Period-to-period analysis of AC signals from nanopore sequencing

ABSTRACT

An alternating signal is applied across a nanopore of a sequencing cell, the nanopore being configured to receive a tag that is connected to a nucleotide, thereby creating a threading event. A first set of voltage data is acquired during a first portion of a plurality of cycles of the alternating signal. Each data point of the first set of voltage data corresponds to a value of a resistance of the nanopore at a different time, where the resistance of the nanopore changes when the tag is received within the nanopore. A shifted set of voltage data is determined from the first set of voltage data and difference data is computed by computing differences between data points of the first set of voltage data and corresponding data points of the shifted set of voltage data. Threading events may be identified based on the difference data.

CROSS-REFERENCES TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 15/628,353 filed Jun. 20, 2017, which claims priority to U.S. Provisional Patent Application 62/354,106 filed Jun. 23, 2016, the disclosures of which are incorporated by reference in their entireties for all purposes.

BACKGROUND

Nanopore membrane devices having pore sizes on the order of one nanometer in internal diameter have shown promise in rapid nucleotide sequencing. When a voltage signal is applied across a nanopore immersed in a conducting fluid, the electric field can move ions in the conducting fluid through the nanopore. The movement of ions in the conducting fluid through the nanopore can cause a small ion current. The voltage applied can also move the molecules to be sequenced into, through, or out of the nanopore. The level of the ion current (or a corresponding voltage) depends on the sizes and chemical structures of the nanopore and the particular molecule that has been moved into the nanopore.

As an alternative to a DNA molecule (or other nucleic acid molecule to be sequenced) moving through the nanopore, a molecule (e.g., a nucleotide being added to a DNA strand) can include a particular tag of a particular size and/or structure. The ion current or a voltage in a circuit including the nanopore (e.g., at an integrating capacitor) can be measured as a way of measuring the resistance of the nanopore corresponding to the molecule, thereby allowing the detection of the particular molecule in the nanopore, and the particular nucleotide at a particular position of a nucleic acid.

A nanopore based sequencing chip may be used for DNA sequencing. A nanopore based sequencing chip can incorporate a large number of sensor cells configured as an array. For example, an array of one million cells may include 1000 rows by 1000 columns of cells.

The voltages that are measured can vary from chip to chip and from cell to cell of a same chip due to manufacturing variability. Therefore, it can be difficult to determine the correct molecule, which may be or correspond to the correct nucleotide in a particular nucleic acid or other polymer in a cell.

Accordingly, improved techniques are desired for sequencing.

BRIEF SUMMARY

Embodiments can provide systems, methods, and apparatuses for signal-processing and base-calling for primary analysis of nanopore sequencing using an AC waveform applied to the nanopore. A differencing technique can be used.

In some embodiments, a differencing scheme can be 1-period or n-periods based. It is also possible to do multiple differencing schemes for locating events in time, where an event can correspond to a nucleotide being added to a nucleic acid, as may be detected with a tag attached to the nucleotide moving into the nanopore. Different measures like time-to-thread (TTT), dwell times, etc. can be obtained.

Other embodiments are directed to systems, portable consumer devices, and computer readable media associated with methods described herein.

A better understanding of the nature and advantages of embodiments of the present invention may be gained with reference to the following detailed description and the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a top view of an embodiment of a nanopore sensor chip having an array of nanopore cells.

FIG. 2 illustrates an embodiment of a nanopore cell in a nanopore sensor chip that can be used to characterize a polynucleotide or a polypeptide.

FIG. 3 illustrates an embodiment of a nanopore cell performing nucleotide sequencing using a nanopore-based sequencing-by-synthesis (Nano-SBS) technique.

FIG. 4 illustrates an embodiment of an electric circuit in a nanopore cell.

FIG. 5 shows example data points captured from a nanopore cell during bright periods and dark periods of AC cycles, according to certain aspects of the present disclosure.

FIG. 6 shows sample data that illustrates the periodicity of voltage data, according to certain aspects of the present disclosure.

FIG. 7 illustrates shifting of voltage data for determining difference data, where the voltage data has one threading event, according to certain aspects of the present disclosure.

FIG. 8 illustrates shifting of voltage data for determining difference data, where the voltage data has three threading events, according to certain aspects of the present disclosure.

FIG. 9 illustrates a schematic diagram showing examples of time shifts of voltage data, according to certain aspects of the present disclosure.

FIG. 10 shows examples of parameters that may be computed from difference data, according to certain aspects of the present disclosure.

FIG. 11 is a flow chart illustrating an example method of period-to-period analysis of AC signals from nanopore sequencing, according to certain aspects of the present disclosure.

FIGS. 12A and 12B show sample data showing a comparison of processed difference data and raw-ADC data showing events, according to certain aspects of the present disclosure.

FIGS. 13A and 13B show sample data showing a comparison of processed signals from the proposed method and raw-ADC signals showing threading events, according to certain aspects of the present disclosure.

FIGS. 14A and 14B show sample data showing a comparison of processed signals from the proposed method and raw-ADC signals showing threading events, according to certain aspects of the present disclosure.

FIG. 15 shows a block diagram of an example computer system usable with systems and methods, according to certain aspects of the present disclosure.

TERMS

Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art. Methods, devices, and materials similar or equivalent to those described herein can be used in the practice of disclosed techniques. The following terms are provided to facilitate understanding of certain terms used frequently and are not meant to limit the scope of the present disclosure. Abbreviations used herein have their conventional meaning within the chemical and biological arts.

“Nucleic acid” may refer to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form. The term may encompass nucleic acids containing known nucleotide analogs or modified backbone residues or linkages, which are synthetic, naturally occurring, and non-naturally occurring, which have similar binding properties as the reference nucleic acid, and which are metabolized in a manner similar to the reference nucleotides. Examples of such analogs may include, without limitation, phosphorothioates, phosphoramidites, methyl phosphonates, chiral-methyl phosphonates, 2-O-methyl ribonucleotides, and peptide-nucleic acids (PNAs). Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)). The term nucleic acid may be used interchangeably with gene, cDNA, mRNA, oligonucleotide, and polynucleotide.

The term “template” may refer to a single stranded nucleic acid molecule that is copied into a complementary strand of DNA nucleotides for DNA synthesis. In some cases, a template may refer to the sequence of DNA that is copied during the synthesis of mRNA.

The term “primer” may refer to a short nucleic acid sequence that provides a starting point for DNA synthesis. Enzymes that catalyze the DNA synthesis, such as DNA polymerases, can add new nucleotides to a primer for DNA replication.

“Polymerase” refers to an enzyme that performs template-directed synthesis of polynucleotides. The term encompasses both a full length polypeptide and a domain that has polymerase activity. DNA polymerases are well-known to those skilled in the art, and include but are not limited to DNA polymerases isolated or derived from Pyrococcus furiosus, Thermococcus litoralis, and Thermotoga maritima, or modified versions thereof. They include both DNA-dependent polymerases and RNA-dependent polymerases such as reverse transcriptase. At least five families of DNA-dependent DNA polymerases are known, although most fall into families A, B and C. There is little or no sequence similarity among the various families. Most family A polymerases are single chain proteins that can contain multiple enzymatic functions including polymerase, 3′ to 5′ exonuclease activity and 5′ to 3′ exonuclease activity. Family B polymerases typically have a single catalytic domain with polymerase and 3′ to 5′ exonuclease activity, as well as accessory factors. Family C polymerases are typically multi-subunit proteins with polymerizing and 3′ to 5′ exonuclease activity. In E. coli, three types of DNA polymerases have been found, DNA polymerases I (family A), II (family B), and III (family C). In eukaryotic cells, three different family B polymerases, DNA polymerases α, δ, and ε, are implicated in nuclear replication, and a family A polymerase, polymerase γ, is used for mitochondrial DNA replication. Other types of DNA polymerases include phage polymerases. Similarly, RNA polymerases typically include eukaryotic RNA polymerases I, II, and III, and bacterial RNA polymerases as well as phage and viral polymerases. RNA polymerases can be DNA-dependent and RNA-dependent.

“Nanopore” refers to a pore, channel or passage formed or otherwise provided in a membrane. A membrane can be an organic membrane, such as a lipid bilayer, or a synthetic membrane, such as a membrane formed of a polymeric material. The nanopore can be disposed adjacent or in proximity to a sensing circuit or an electrode coupled to a sensing circuit, such as, for example, a complementary metal oxide semiconductor (CMOS) or field effect transistor (FET) circuit. In some examples, a nanopore has a characteristic width or diameter on the order of 0.1 nanometers (nm) to about 1000 nm. Some nanopores are proteins.

“Nucleotide,” in addition to referring to the naturally occurring ribonucleotide or deoxyribonucleotide monomers, can be understood to refer to related structural variants thereof, including derivatives and analogs, that are functionally equivalent with respect to the particular context in which the nucleotide is being used (e.g., hybridization to a complementary base), unless the context clearly indicates otherwise.

“Tag” refers to a detectable moiety that can be atoms or molecules, or a collection of atoms or molecules. A tag can provide an optical, electrochemical, magnetic, or electrostatic (e.g., inductive, capacitive) signature, which signature may be detected with the aid of a nanopore. Typically, when a nucleotide is attached to the tag it is called a “Tagged Nucleotide.” The tag can be attached to the nucleotide via the phosphate moiety.

As used herein, the term “bright period” may generally refer to the time period when a tag of a tagged nucleotide is forced into a nanopore by an electric field applied through an AC signal. The term “dark period” may generally refer to the time period when a tag of a tagged nucleotide is pushed out of the nanopore by the electric field applied through the AC signal. An AC cycle may include the bright period and the dark period. In different embodiments, the polarity of the voltage signal applied to a nanopore cell to put the nanopore cell into the bright period (or the dark period) may be different.

DETAILED DESCRIPTION

Certain methods of primary analysis use a large number of parameters for signal processing of AC data (resulting from application of an alternating voltage) and base-calling for processed data generated by nanopore sequencing, where an AC signal is applied to the nanopore. Relying on a large number of parameters can make the analysis slow, noisy, and non-robust, particularly when first applied to a new system. Filtering used in such signal processing can also introduce its own artifacts (depending on the filter(s) used and their parameters), and can amplify noise and propagate errors to base-calling and alignment. The performance (speed, sensitivity, accuracy, etc.) of primary analysis is important.

In some embodiments, differences are determined between corresponding voltage measurements for a cycle of the AC data, and the difference data is analyzed to identify threading events of a tag in the nanopore, where the tag corresponds to a particular nucleotide. Some advantages can include the following: the approach is simple, fast, and robust (it requires few or no external parameters), does not need any filtering, can be easily implemented on an FPGA or GPU, and can eliminate most noise (other than ADC noise). In addition, subsequent base-calling can be done on integrated data, making it less sensitive to noise. Since embodiments can use a differencing of corresponding points from adjacent or otherwise nearby cycles (also referred to herein as a local neighborhood), embodiments can be adaptive to local systematic variations in the raw data, e.g., correcting for gain drift, baseline shift, or the like.
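The differencing of corresponding points from nearby cycles can be illustrated with a minimal Python sketch. The function names, the fixed number of samples per cycle, the threshold, and the synthetic voltage values are all assumptions made for illustration; they are not taken from the disclosure.

```python
from typing import List, Sequence


def period_difference(samples: Sequence[float],
                      points_per_cycle: int,
                      n_periods: int = 1) -> List[float]:
    """Subtract from each sample the corresponding sample n_periods cycles earlier.

    Assumes (for illustration) a fixed number of samples per AC cycle.
    """
    if points_per_cycle <= 0 or n_periods <= 0:
        raise ValueError("points_per_cycle and n_periods must be positive")
    shift = points_per_cycle * n_periods
    return [samples[i] - samples[i - shift] for i in range(shift, len(samples))]


def find_candidate_events(diff: Sequence[float], threshold: float) -> List[int]:
    """Return indices of difference data whose magnitude exceeds a threshold."""
    return [i for i, d in enumerate(diff) if abs(d) > threshold]


# Synthetic data (hypothetical values): 4 samples per cycle, with the level
# dropping in cycles 3 and 4 as if a tag were threaded, then recovering.
open_cycle = [100.0, 90.0, 80.0, 70.0]
threaded_cycle = [100.0, 60.0, 50.0, 40.0]
samples = open_cycle * 2 + threaded_cycle * 2 + open_cycle

diff = period_difference(samples, points_per_cycle=4)
events = find_candidate_events(diff, threshold=10.0)

# A constant baseline offset cancels in the difference data.
shifted_diff = period_difference([s + 5.0 for s in samples], points_per_cycle=4)
```

In this sketch the difference data is zero wherever consecutive cycles match and nonzero only at the cycle where the level changes, and a constant offset added to every sample leaves the difference data unchanged, which is one way to see the robustness to baseline shift described above.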

I. Nanopore Based Sequencing Chip

FIG. 1 is a top view of an embodiment of a nanopore sensor chip 100 having an array 140 of nanopore cells 150. Each nanopore cell 150 includes a control circuit integrated on a silicon substrate of nanopore sensor chip 100. In some embodiments, side walls 136 may be included in array 140 to separate groups of nanopore cells 150 so that each group may receive a different sample for characterization. Each nanopore cell may be used to sequence a nucleic acid. In some embodiments, nanopore sensor chip 100 may include a cover plate 130. In some embodiments, nanopore sensor chip 100 may also include a plurality of pins 110 for interfacing with other circuits, such as a computer processor.

In some embodiments, nanopore sensor chip 100 may include multiple chips in a same package, such as, for example, a Multi-Chip Module (MCM) or System-in-Package (SiP). The chips may include, for example, a memory, a processor, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), data converters, a high-speed I/O interface, etc.

In some embodiments, nanopore sensor chip 100 may be coupled to (e.g., docked to) a nanochip workstation 120, which may include various components for carrying out (e.g., automatically carrying out) various embodiments of the processes disclosed herein, including, for example, analyte delivery mechanisms, such as pipettes for delivering lipid suspension or other membrane structure suspension, analyte solution, and/or other liquids, suspension or solids, robotic arms, computer processor, and/or memory. A plurality of polynucleotides may be detected on array 140 of nanopore cells 150. In some embodiments, each nanopore cell 150 can be individually addressable.

II. Nanopore Sequencing Cell

Nanopore cells 150 in nanopore sensor chip 100 may be implemented in many different ways. For example, in some embodiments, tags of different sizes and/or chemical structures may be attached to different nucleotides in a nucleic acid molecule to be sequenced. In some embodiments, a complementary strand to a template of the nucleic acid molecule to be sequenced may be synthesized by hybridizing differently polymer-tagged nucleotides with the template. In some implementations, the nucleic acid molecule and the attached tags may both move through the nanopore, and an ion current passing through the nanopore may indicate the nucleotide that is in the nanopore because of the particular size and/or structure of the tag attached to the nucleotide. In some implementations, only the tags may be moved into the nanopore. There may also be many different ways to detect the different tags in the nanopores.

A. Nanopore Sequencing Cell Structure

FIG. 2 illustrates an embodiment of an example nanopore cell 200 in a nanopore sensor chip, such as nanopore cell 150 in nanopore sensor chip 100 of FIG. 1, that can be used to characterize a polynucleotide or a polypeptide. Nanopore cell 200 may include a well 205 formed of dielectric layers 201 and 204; a membrane, such as a lipid bilayer 214 formed over well 205; and a sample chamber 215 on lipid bilayer 214 and separated from well 205 by lipid bilayer 214. Well 205 may contain a volume of electrolyte 206, and sample chamber 215 may hold bulk electrolyte 208 containing a nanopore, e.g., a soluble protein nanopore transmembrane molecular complex (PNTMC), and the analyte of interest (e.g., a nucleic acid molecule to be sequenced).

Nanopore cell 200 may include a working electrode 202 at the bottom of well 205 and a counter electrode 210 disposed in sample chamber 215. A signal source 228 may apply a voltage signal between working electrode 202 and counter electrode 210. A single nanopore (e.g., a PNTMC) may be inserted into lipid bilayer 214 by an electroporation process caused by the voltage signal, thereby forming a nanopore 216 in lipid bilayer 214. The individual membranes (e.g., lipid bilayers 214 or other membrane structures) in the array may be neither chemically nor electrically connected to each other. Thus, each nanopore cell in the array may be an independent sequencing machine, producing data unique to the single polymer molecule associated with the nanopore that operates on the analyte of interest and modulates the ionic current through the otherwise impermeable lipid bilayer.

As shown in FIG. 2, nanopore cell 200 may be formed on a substrate 230, such as a silicon substrate. Dielectric layer 201 may be formed on substrate 230. Dielectric material used to form dielectric layer 201 may include, for example, glass, oxides, nitrides, and the like. An electric circuit 222 for controlling electrical stimulation and for processing the data detected from nanopore cell 200 may be formed on substrate 230 and/or within dielectric layer 201. For example, a plurality of patterned metal layers (e.g., metal 1 to metal 6) may be formed in dielectric layer 201, and a plurality of active devices (e.g., transistors) may be fabricated on substrate 230. In some embodiments, signal source 228 is included as a part of electric circuit 222. Electric circuit 222 may include, for example, amplifiers, integrators, analog-to-digital converters, noise filters, feedback control logic, and/or various other components. Electric circuit 222 may be further coupled to a processor 224 that is coupled to a memory 226, where processor 224 can analyze the sequencing data to determine sequences of the polymer molecules that have been sequenced in the array.

Working electrode 202 may be formed on dielectric layer 201, and may form at least a part of the bottom of well 205. In some embodiments, working electrode 202 is a metal electrode. For non-faradaic conduction, working electrode 202 may be made of metals or other materials that are resistant to corrosion and oxidation, such as, for example, platinum, gold, titanium nitride, and graphite. For example, working electrode 202 may be a platinum electrode with electroplated platinum. In another example, working electrode 202 may be a titanium nitride (TiN) working electrode. Working electrode 202 may be porous, thereby increasing its surface area and a resulting capacitance associated with working electrode 202. Because the working electrode of a nanopore cell may be independent from the working electrode of another nanopore cell, the working electrode may be referred to as a cell electrode in this disclosure.

Dielectric layer 204 may be formed above dielectric layer 201. Dielectric layer 204 forms the walls surrounding well 205. Dielectric material used to form dielectric layer 204 may include, for example, glass, oxide, silicon mononitride (SiN), polyimide, or other suitable hydrophobic insulating material. The top surface of dielectric layer 204 may be silanized. The silanization may form a hydrophobic layer 220 above the top surface of dielectric layer 204. In some embodiments, hydrophobic layer 220 has a thickness of about 1.5 nanometers (nm).

Well 205 formed by the walls of dielectric layer 204 includes volume of electrolyte 206 above working electrode 202. Volume of electrolyte 206 may be buffered and may include one or more of the following: lithium chloride (LiCl), sodium chloride (NaCl), potassium chloride (KCl), lithium glutamate, sodium glutamate, potassium glutamate, lithium acetate, sodium acetate, potassium acetate, calcium chloride (CaCl₂), strontium chloride (SrCl₂), manganese chloride (MnCl₂), and magnesium chloride (MgCl₂). In some embodiments, volume of electrolyte 206 has a thickness of about three microns (μm).

As also shown in FIG. 2, a membrane may be formed on top of dielectric layer 204 and span across well 205. In some embodiments, the membrane may include a lipid monolayer 218 formed on top of hydrophobic layer 220. As the membrane reaches the opening of well 205, lipid monolayer 218 may transition to lipid bilayer 214 that spans across the opening of well 205. The lipid bilayer may comprise or consist of phospholipid, for example, selected from diphytanoyl-phosphatidylcholine (DPhPC), 1,2-diphytanoyl-sn-glycero-3-phosphocholine, 1,2-Di-O-Phytanyl-sn-Glycero-3-phosphocholine (DoPhPC), palmitoyl-oleoyl-phosphatidylcholine (POPC), dioleoyl-phosphatidyl-methylester (DOPME), dipalmitoylphosphatidylcholine (DPPC), phosphatidylcholine, phosphatidylethanolamine, phosphatidylserine, phosphatidic acid, phosphatidylinositol, phosphatidylglycerol, sphingomyelin, 1,2-di-O-phytanyl-sn-glycerol; 1,2-dipalmitoyl-sn-glycero-3-phosphoethanolamine-N-[methoxy(polyethylene glycol)-350]; 1,2-dipalmitoyl-sn-glycero-3-phosphoethanolamine-N-[methoxy(polyethylene glycol)-550]; 1,2-dipalmitoyl-sn-glycero-3-phosphoethanolamine-N-[methoxy(polyethylene glycol)-750]; 1,2-dipalmitoyl-sn-glycero-3-phosphoethanolamine-N-[methoxy(polyethylene glycol)-1000]; 1,2-dipalmitoyl-sn-glycero-3-phosphoethanolamine-N-[methoxy(polyethylene glycol)-2000]; 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine-N-lactosyl; GM1 Ganglioside, Lysophosphatidylcholine (LPC) or any combination thereof.

As shown, lipid bilayer 214 is embedded with a single nanopore 216, e.g., formed by a single PNTMC. As described above, nanopore 216 may be formed by inserting a single PNTMC into lipid bilayer 214 by electroporation. Nanopore 216 may be large enough for passing at least a portion of the analyte of interest and/or small ions (e.g., Na⁺, K⁺, Ca²⁺, Cl⁻) between the two sides of lipid bilayer 214.

Sample chamber 215 is over lipid bilayer 214, and can hold a solution of the analyte of interest for characterization. The solution may be an aqueous solution containing bulk electrolyte 208 and buffered to an optimum ion concentration and maintained at an optimum pH to keep the nanopore 216 open. Nanopore 216 crosses lipid bilayer 214 and provides the only path for ionic flow from bulk electrolyte 208 to working electrode 202. In addition to nanopores (e.g., PNTMCs) and the analyte of interest, bulk electrolyte 208 may further include one or more of the following: lithium chloride (LiCl), sodium chloride (NaCl), potassium chloride (KCl), lithium glutamate, sodium glutamate, potassium glutamate, lithium acetate, sodium acetate, potassium acetate, calcium chloride (CaCl₂), strontium chloride (SrCl₂), manganese chloride (MnCl₂), and magnesium chloride (MgCl₂).

Counter electrode (CE) 210 may be an electrochemical potential sensor. In some embodiments, counter electrode 210 may be shared between a plurality of nanopore cells, and may therefore be referred to as a common electrode. In some cases, the common potential and the common electrode may be common to all nanopore cells, or at least all nanopore cells within a particular grouping. The common electrode can be configured to apply a common potential to the bulk electrolyte 208 in contact with the nanopore 216. Counter electrode 210 and working electrode 202 may be coupled to signal source 228 for providing electrical stimulus (e.g., voltage bias) across lipid bilayer 214, and may be used for sensing electrical characteristics of lipid bilayer 214 (e.g., resistance, capacitance, and ionic current flow). In some embodiments, nanopore cell 200 can also include a reference electrode 212.

In some embodiments, various checks can be made during creation of the nanopore cell as part of calibration. Once a nanopore cell is created, further calibration steps can be performed, e.g., to identify nanopore cells that are performing as desired (e.g., one nanopore in the cell). Such calibration checks can include physical checks, voltage calibration, open channel calibration, and identification of cells with a single nanopore.

B. Detection Signals of Nanopore Sequencing Cell

Nanopore cells in a nanopore sensor chip, such as nanopore cells 150 in nanopore sensor chip 100, may enable parallel sequencing using a single molecule nanopore-based sequencing by synthesis (Nano-SBS) technique.

FIG. 3 illustrates an embodiment of a nanopore cell 300 performing nucleotide sequencing using the Nano-SBS technique. In the Nano-SBS technique, a template 332 to be sequenced (e.g., a nucleic acid molecule or another analyte of interest) and a primer may be introduced into bulk electrolyte 308 in the sample chamber of nanopore cell 300. As examples, template 332 can be circular or linear. A nucleic acid primer may be hybridized to a portion of template 332 to which four differently polymer-tagged nucleotides 338 may be added.

In some embodiments, an enzyme (e.g., a polymerase 334, such as a DNA polymerase) may be associated with nanopore 316 for use in synthesizing a complementary strand to template 332. For example, polymerase 334 may be covalently attached to nanopore 316. Polymerase 334 may catalyze the incorporation of nucleotides 338 onto the primer using a single stranded nucleic acid molecule as the template. Nucleotides 338 may comprise tag species (“tags”) with the nucleotide being one of four different types: A, T, G, or C. When a tagged nucleotide is correctly complexed with polymerase 334, the tag may be pulled (loaded) into the nanopore by an electrical force, such as a force generated in the presence of an electric field generated by a voltage applied across lipid bilayer 314 and/or nanopore 316. The tail of the tag may be positioned in the barrel of nanopore 316. The tag held in the barrel of nanopore 316 may generate a unique ionic blockade signal 340 due to the tag's distinct chemical structure and/or size, thereby electronically identifying the added base to which the tag attaches.

As used herein, a “loaded” or “threaded” tag may be one that is positioned in and/or remains in or near the nanopore for an appreciable amount of time, e.g., 0.1 millisecond (ms) to 10000 ms. In some cases, a tag is loaded in the nanopore prior to being released from the nucleotide. In some instances, the probability of a loaded tag passing through (and/or being detected by) the nanopore after being released upon a nucleotide incorporation event is suitably high, e.g., 90% to 99%.

In some embodiments, before polymerase 334 is connected to nanopore 316, the conductance of nanopore 316 may be high, such as, for example, about 300 picosiemens (300 pS). As the tag is loaded in the nanopore, a unique conductance signal (e.g., signal 340) is generated due to the tag's distinct chemical structure and/or size. For example, the conductance of the nanopore can be about 60 pS, 80 pS, 100 pS, or 120 pS, each corresponding to one of the four types of tagged nucleotides. The polymerase may then undergo an isomerization and a transphosphorylation reaction to incorporate the nucleotide into the growing nucleic acid molecule and release the tag molecule.
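As an illustration of how the example conductance levels above could be used, the following Python sketch assigns a measured conductance to the nearest tag level. The assignment of bases to levels, the tolerance, and the function name are hypothetical; the text does not state which tagged nucleotide corresponds to which conductance.

```python
from typing import Optional

# Hypothetical mapping: the disclosure gives four example levels but does not
# say which base corresponds to which level.
TAG_LEVELS_PS = {"A": 60.0, "C": 80.0, "G": 100.0, "T": 120.0}


def classify_conductance(g_ps: float, tolerance_ps: float = 8.0) -> Optional[str]:
    """Return the base whose example level is nearest to g_ps, or None.

    None is returned when no level lies within tolerance_ps, e.g., for an
    open channel near 300 pS. The tolerance is an assumed value.
    """
    best = min(TAG_LEVELS_PS, key=lambda base: abs(TAG_LEVELS_PS[base] - g_ps))
    if abs(TAG_LEVELS_PS[best] - g_ps) <= tolerance_ps:
        return best
    return None


call_near_80 = classify_conductance(82.0)
call_open_channel = classify_conductance(300.0)
```

A tolerance smaller than half the 20 pS spacing between levels keeps a measurement from being accepted when it lies midway between two levels.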

In some cases, some of the tagged nucleotides may not match (as complementary bases) a current position of the nucleic acid molecule (template). The tagged nucleotides that are not base-paired with the nucleic acid molecule may also pass through the nanopore. These non-paired nucleotides can be rejected by the polymerase within a time scale that is shorter than the time scale for which correctly paired nucleotides remain associated with the polymerase. Tags bound to non-paired nucleotides may pass through the nanopore quickly, and be detected for a short period of time (e.g., less than 10 ms), while tags bound to paired nucleotides can be loaded into the nanopore and detected for a long period of time (e.g., at least 10 ms). Therefore, non-paired nucleotides may be identified by a downstream processor based at least in part on the time for which the nucleotide is detected in the nanopore.
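The dwell-time criterion above can be sketched in Python as follows. The per-sample presence flags, the sampling interval, and the function names are assumptions for illustration; only the 10 ms example threshold comes from the text.

```python
from typing import List, Sequence


def dwell_times_ms(tag_present: Sequence[bool], sample_interval_ms: float) -> List[float]:
    """Return the duration of each run of consecutive tag-present samples."""
    runs: List[float] = []
    count = 0
    for flag in tag_present:
        if flag:
            count += 1
        elif count:
            runs.append(count * sample_interval_ms)
            count = 0
    if count:  # close a run that extends to the end of the data
        runs.append(count * sample_interval_ms)
    return runs


def is_paired(dwell_ms: float, threshold_ms: float = 10.0) -> bool:
    """Treat detections lasting at least threshold_ms as paired nucleotides."""
    return dwell_ms >= threshold_ms


# Hypothetical detection flags sampled every 1 ms: a 2 ms blip, then a 12 ms dwell.
flags = [False, True, True, False] + [True] * 12 + [False]
dwells = dwell_times_ms(flags, sample_interval_ms=1.0)
paired = [is_paired(d) for d in dwells]
```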

A conductance (or equivalently the resistance) of the nanopore including the loaded (threaded) tag can be measured via a current passing through the nanopore, thereby providing an identification of the tag species and thus the nucleotide at the current position. In some embodiments, a direct current (DC) signal can be applied to the nanopore cell (e.g., so that the direction at which the tag moves through the nanopore is not reversed). However, operating a nanopore sensor for long periods of time using a direct current can change the composition of the electrode, unbalance the ion concentrations across the nanopore, and have other undesirable effects that can affect the lifetime of the nanopore cell. Applying an alternating current (AC) waveform can reduce the electro-migration to avoid these undesirable effects and have certain advantages as described below. The nucleic acid sequencing methods described herein that utilize tagged nucleotides are fully compatible with applied AC voltages, and therefore an AC waveform can be used to achieve these advantages.

The ability to re-charge the electrode during the AC detection cycle can be advantageous when sacrificial electrodes or electrodes that change molecular character in current-carrying reactions (e.g., electrodes comprising silver) are used. An electrode may deplete during a detection cycle when a direct current signal is used. The recharging can prevent the electrode from reaching a depletion limit, such as becoming fully depleted, which can be a problem when the electrodes are small (e.g., when the electrodes are small enough to provide an array of electrodes having at least 500 electrodes per square millimeter). Electrode lifetime in some cases scales with, and is at least partly dependent on, the width of the electrode.

Suitable conditions for measuring ionic currents passing through the nanopores are known in the art and examples are provided herein. The measurement may be carried out with a voltage applied across the membrane and pore. In some embodiments, the voltage used may range from −400 mV to +400 mV. The voltage used is preferably in a range having a lower limit selected from −400 mV, −300 mV, −200 mV, −150 mV, −100 mV, −50 mV, −20 mV, and 0 mV, and an upper limit independently selected from +10 mV, +20 mV, +50 mV, +100 mV, +150 mV, +200 mV, +300 mV, and +400 mV. The voltage used may be more preferably in the range of 100 mV to 240 mV and most preferably in the range of 160 mV to 240 mV. It is possible to increase discrimination between different nucleotides by a nanopore using an increased applied potential. Sequencing nucleic acids using AC waveforms and tagged nucleotides is described in US Patent Publication No. US 2014/0134616 entitled “Nucleic Acid Sequencing Using Tags,” filed on Nov. 6, 2013, which is herein incorporated by reference in its entirety. In addition to the tagged nucleotides described in US 2014/0134616, sequencing can be performed using nucleotide analogs that lack a sugar or acyclic moiety, e.g., (S)-Glycerol nucleoside triphosphates (gNTPs) of the five common nucleobases: adenine, cytosine, guanine, uracil, and thymine (Horhota et al., Organic Letters, 8:5345-5347 [2006]).

C. Electric Circuit of Nanopore Sequencing Cell

FIG. 4 illustrates an embodiment of an electric circuit 400 (which may include portions of electric circuit 222 in FIG. 2) in a nanopore cell, such as nanopore cell 200. As described above, in some embodiments, electric circuit 400 includes a counter electrode 210 that may be shared between a plurality of nanopore cells or all nanopore cells in a nanopore sensor chip, and may therefore also be referred to as a common electrode. The common electrode can be configured to apply a common potential to the bulk electrolyte (e.g., bulk electrolyte 208) in contact with the lipid bilayer (e.g., lipid bilayer 214) in the nanopore cells by connecting to an alternating voltage source 420 (V_(LIQ)). In some embodiments, an AC non-Faradaic mode may be utilized to modulate voltage V_(LIQ) with an AC signal (e.g., a square wave) and apply it to the bulk electrolyte in contact with the lipid bilayer in the nanopore cell. In some embodiments, V_(LIQ) is a square wave with a magnitude of ±200-250 mV and a frequency between, for example, 25 and 400 Hz. The bulk electrolyte between counter electrode 210 and the lipid bilayer (e.g., lipid bilayer 214) may be modeled by a large capacitor (not shown), such as, for example, 100 μF or larger.

FIG. 4 also shows an electrical model 422 representing the electrical properties of a working electrode (e.g., working electrode 202) and the lipid bilayer (e.g., lipid bilayer 214). Electrical model 422 includes a capacitor 426 (C_(Bilayer)) that models a capacitance associated with the lipid bilayer and a resistor 428 (R_(PORE)) that models a variable resistance associated with the nanopore, which can change based on the presence of a particular tag in the nanopore. Electrical model 422 also includes a capacitor 424 having a double layer capacitance (C_(Double Layer)) and representing the electrical properties of working electrode 202 and well 205. Working electrode 202 may be configured to apply a distinct potential independent from the working electrodes in other nanopore cells.

Pass device 406 is a switch that can be used to connect or disconnect the lipid bilayer and the working electrode from electric circuit 400. Pass device 406 may be controlled by control line 407 to enable or disable a voltage stimulus to be applied across the lipid bilayer in the nanopore cell. Before lipids are deposited to form the lipid bilayer, the impedance between the two electrodes may be very low because the well of the nanopore cell is not sealed, and therefore pass device 406 may be kept open to avoid a short-circuit condition. Pass device 406 may be closed after lipid solvent has been deposited to the nanopore cell to seal the well of the nanopore cell.

Electric circuit 400 may further include an on-chip integrating capacitor 408 (n_(cap)). Integrating capacitor 408 may be pre-charged by using a reset signal 403 to close switch 401, such that integrating capacitor 408 is connected to a voltage source V_(PRE) 405. In some embodiments, voltage source V_(PRE) 405 provides a constant reference voltage with a magnitude of, for example, 900 mV. When switch 401 is closed, integrating capacitor 408 may be pre-charged to the reference voltage level of voltage source V_(PRE) 405.

After integrating capacitor 408 is pre-charged, reset signal 403 may be used to open switch 401 such that integrating capacitor 408 is disconnected from voltage source V_(PRE) 405. At this point, depending on the level of voltage source V_(LIQ), the potential of counter electrode 210 may be at a level higher than the potential of working electrode 202 (and integrating capacitor 408), or vice versa. For example, during a positive phase of a square wave from voltage source V_(LIQ) (e.g., the bright or dark period of the AC voltage source signal cycle), the potential of counter electrode 210 is at a level higher than the potential of working electrode 202. During a negative phase of the square wave from voltage source V_(LIQ) (e.g., the dark or bright period of the AC voltage source signal cycle), the potential of counter electrode 210 is at a level lower than the potential of working electrode 202. Thus, in some embodiments, integrating capacitor 408 may be further charged during the bright period from the pre-charged voltage level of voltage source V_(PRE) 405 to a higher level, and discharged during the dark period to a lower level, due to the potential difference between counter electrode 210 and working electrode 202. In other embodiments, the charging and discharging may occur in dark periods and bright periods, respectively.

Integrating capacitor 408 may be charged or discharged for a fixed period of time, depending on the sampling rate of an analog-to-digital converter (ADC) 410, which may be higher than 1 kHz, 5 kHz, 10 kHz, 100 kHz, or more. For example, with a sampling rate of 1 kHz, integrating capacitor 408 may be charged/discharged for a period of about 1 ms, and then the voltage level may be sampled and converted by ADC 410 at the end of the integration period. A particular voltage level would correspond to a particular tag species in the nanopore, and thus correspond to the nucleotide at a current position on the template.

After being sampled by ADC 410, integrating capacitor 408 may be pre-charged again by using reset signal 403 to close switch 401, such that integrating capacitor 408 is connected to voltage source V_(PRE) 405 again. The steps of pre-charging integrating capacitor 408, waiting for a fixed period of time for integrating capacitor 408 to charge or discharge, and sampling and converting the voltage level of integrating capacitor 408 by ADC 410 can be repeated in cycles throughout the sequencing process.

A digital processor 430 can process the ADC output data, e.g., for normalization, data buffering, data filtering, data compression, data reduction, event extraction, or assembling ADC output data from the array of nanopore cells into various data frames. In some embodiments, digital processor 430 can perform further downstream processing, such as base determination. Digital processor 430 can be implemented as hardware (e.g., in a GPU, FPGA, ASIC, etc.) or as a combination of hardware and software.

Accordingly, the voltage signal applied across the nanopore can be used to detect particular states of the nanopore. One of the possible states of the nanopore is an open-channel state when a tag-attached polyphosphate is absent from the barrel of the nanopore, also referred to herein as the unthreaded state of the nanopore. Another four possible states of the nanopore each correspond to a state when one of the four different types of tag-attached polyphosphate nucleotides (A, T, G, or C) is held in the barrel of the nanopore. Yet another possible state of the nanopore is when the lipid bilayer is ruptured.

When the voltage level on integrating capacitor 408 is measured after a fixed period of time, the different states of a nanopore may result in measurements of different voltage levels. This is because the rate of the voltage decay (decrease by discharging or increase by charging) on integrating capacitor 408 (i.e., the steepness of the slope of a voltage on integrating capacitor 408 versus time plot) depends on the nanopore resistance (e.g., the resistance of resistor R_(PORE) 428). More particularly, as the resistance associated with the nanopore in different states is different due to the molecules' (tags') distinct chemical structures, different corresponding rates of voltage decay may be observed and may be used to identify the different states of the nanopore. The voltage decay curve may be an exponential curve with an RC time constant τ=RC, where R is the resistance associated with the nanopore (i.e., R_(PORE) 428) and C is the capacitance associated with the membrane (i.e., capacitor 426 (C_(Bilayer))) in parallel with R. A time constant of the nanopore cell can be, for example, about 200-500 ms. The decay curve may not fit exactly to an exponential curve due to the detailed implementation of the bilayer, but the decay curve may be similar to an exponential curve and is monotonic, thus allowing detection of tags.

In some embodiments, the resistance associated with the nanopore in an open-channel state may be in the range of 100 MOhm to 20 GOhm. In some embodiments, the resistance associated with the nanopore in a state where a tag is inside the barrel of the nanopore may be within the range of 200 MOhm to 40 GOhm. In other embodiments, integrating capacitor 408 may be omitted, as the voltage leading to ADC 410 will still vary due to the voltage decay in electrical model 422.

The rate of the decay of the voltage on integrating capacitor 408 may be determined in different ways. As explained above, the rate of the voltage decay may be determined by measuring a voltage decay during a fixed time interval. For example, the voltage on integrating capacitor 408 may be first measured by ADC 410 at time t1, and then the voltage is measured again by ADC 410 at time t2. The voltage difference is greater when the slope of the voltage on integrating capacitor 408 versus time curve is steeper, and the voltage difference is smaller when the slope of the voltage curve is less steep. Thus, the voltage difference may be used as a metric for determining the rate of the decay of the voltage on integrating capacitor 408, and thus the state of the nanopore cell.

In other embodiments, the rate of the voltage decay can be determined by measuring a time duration that is required for a selected amount of voltage decay. For example, the time required for the voltage to drop or increase from a first voltage level V1 to a second voltage level V2 may be measured. The time required is less when the slope of the voltage vs. time curve is steeper, and the time required is greater when the slope of the voltage vs. time curve is less steep. Thus, the measured time required may be used as a metric for determining the rate of the decay of the voltage on integrating capacitor n_(cap) 408, and thus the state of the nanopore cell. One skilled in the art will appreciate the various circuits that can be used to measure the resistance of the nanopore, e.g., including current measurement techniques.
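The two ways of estimating the decay rate described above can be sketched in Python under the simplifying assumption of an ideal single-exponential decay toward a fixed target level. The function names, the 0.7 V target level, and the two time constants (picked from the 200-500 ms range mentioned above) are illustrative assumptions only and are not taken from any embodiment.

```python
import math

def voltage_at(t, v_start, v_target, tau):
    """Idealized single-exponential level t seconds after the switch opens."""
    return v_target + (v_start - v_target) * math.exp(-t / tau)

def delta_v(t1, t2, v_start, v_target, tau):
    """Metric 1: magnitude of the voltage change between sample times t1 and t2."""
    v1 = voltage_at(t1, v_start, v_target, tau)
    v2 = voltage_at(t2, v_start, v_target, tau)
    return abs(v2 - v1)

def time_between_levels(v1, v2, v_target, tau):
    """Metric 2: time for the voltage to move from level v1 to level v2.

    Both levels must lie on the same side of v_target, with v2 closer to it.
    """
    return tau * math.log((v1 - v_target) / (v2 - v_target))

# Assumed values for illustration: V_PRE = 0.9 V, hypothetical target 0.7 V.
V_PRE, V_TARGET = 0.9, 0.7
TAU_FAST, TAU_SLOW = 0.2, 0.5   # seconds; lower vs. higher pore resistance

dv_fast = delta_v(0.0, 0.0005, V_PRE, V_TARGET, TAU_FAST)
dv_slow = delta_v(0.0, 0.0005, V_PRE, V_TARGET, TAU_SLOW)
t_fast = time_between_levels(0.9, 0.85, V_TARGET, TAU_FAST)
t_slow = time_between_levels(0.9, 0.85, V_TARGET, TAU_SLOW)
```

In this idealized model a steeper decay (smaller time constant) gives a larger voltage change over the fixed interval and a shorter time between the two levels, so either quantity can serve as a proxy for the nanopore resistance.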

In some embodiments, electric circuit 400 may not include a pass device (e.g., pass device 406) and an extra capacitor (e.g., integrating capacitor 408 (n_(cap))) that are fabricated on-chip, thereby facilitating the reduction in size of the nanopore-based sequencing chip. Due to the thin nature of the membrane (lipid bilayer), the capacitance associated with the membrane (e.g., capacitor 426 (C_(Bilayer))) alone can suffice to create the required RC time constant without the need for additional on-chip capacitance. Therefore, capacitor 426 may be used as the integrating capacitor, and may be pre-charged by the voltage signal V_(PRE) and subsequently be discharged or charged by the voltage signal V_(LIQ). The elimination of the extra capacitor and the pass device that are otherwise fabricated on-chip in the electric circuit can significantly reduce the footprint of a single nanopore cell in the nanopore sequencing chip, thereby facilitating the scaling of the nanopore sequencing chip to include more and more cells (e.g., having millions of cells in a nanopore sequencing chip).

D. Data Sampling in Nanopore Cell

To perform sequencing of a nucleic acid, the voltage level of the integrating capacitor (e.g., integrating capacitor 408 (n_(cap)) or capacitor 426 (C_(Bilayer))) can be sampled and converted by the ADC (e.g., ADC 410) while a tagged nucleotide is being added to the nucleic acid. The tag of the nucleotide can be pushed into the barrel of the nanopore by the electric field across the nanopore that is applied through the counter electrode and the working electrode, for example, when the applied voltage is such that V_(LIQ) is lower than V_(PRE).

1. Threading

A threading event is when a tagged nucleotide is attached to the template (e.g., nucleic acid fragment), and the tag goes in and out of the barrel of the nanopore. This can happen multiple times during a threading event. When the tag is in the barrel of the nanopore, the resistance of the nanopore may be higher, and a lower current may flow through the nanopore.

During sequencing, a tag may not be in the nanopore in some AC cycles (referred to as an open-channel state), where the current is the highest because of the lower resistance of the nanopore. When a tag is attracted into the barrel of the nanopore, the nanopore is in a bright mode. When the tag is pushed out of the barrel of the nanopore, the nanopore is in a dark mode.

2. Bright and Dark Period

During an AC cycle, the voltage on the integrating capacitor may be sampled multiple times by the ADC. For example, in one embodiment, an AC voltage signal is applied across the system at, e.g., about 100 Hz, and an acquisition rate of the ADC can be about 2000 Hz per cell. Thus, there can be about 20 data points (voltage measurements) captured per AC cycle (cycle of an AC waveform). Data points corresponding to one cycle of the AC waveform may be referred to as a set. In a set of data points for an AC cycle, there may be a subset captured when, for example, V_(LIQ) is lower than V_(PRE), which may correspond to a bright mode (period) where the tag is forced into the barrel of the nanopore. Another subset may correspond to a dark mode (period) where the tag is pushed out of the barrel of the nanopore by the applied electric field when, for example, V_(LIQ) is higher than V_(PRE).
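As a small worked example of the arithmetic above, the following Python sketch computes the number of data points per AC cycle and splits a flat record of ADC readings into per-cycle sets with bright and dark subsets. The helper name split_cycles, the assumption that each cycle begins with its bright period, and the choice of 8 bright samples per cycle are hypothetical and serve only to illustrate the bookkeeping.

```python
def split_cycles(adc, samples_per_cycle, bright_samples):
    """Split a flat list of ADC readings into per-cycle (bright, dark) subsets.

    Assumes the record starts on a cycle boundary and that each cycle starts
    with its bright period; a trailing partial cycle is dropped.
    """
    cycles = []
    n_full = len(adc) // samples_per_cycle
    for k in range(n_full):
        start = k * samples_per_cycle
        cycle = adc[start:start + samples_per_cycle]
        cycles.append((cycle[:bright_samples], cycle[bright_samples:]))
    return cycles

acq_rate_hz = 2000   # ADC acquisition rate per cell, from the example above
ac_freq_hz = 100     # AC signal frequency, from the example above
samples_per_cycle = acq_rate_hz // ac_freq_hz   # about 20 data points per cycle

# Placeholder readings (just sample indices) to show the partitioning.
record = list(range(45))
sets = split_cycles(record, samples_per_cycle, bright_samples=8)
```

With these numbers, the 45-sample record yields two complete sets of 20 data points each, and the last 5 samples are ignored because they do not form a full cycle.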

3. Measured Voltages

For each data point, when the switch 401 is opened, the voltage at the integrating capacitor (e.g., integrating capacitor 408 (n_(cap)) or capacitor 426 (C_(Bilayer))) will change in a decaying manner as a result of the charging/discharging by V_(LIQ), e.g., as an increase from V_(PRE) to V_(LIQ) when V_(LIQ) is higher than V_(PRE) or a decrease from V_(PRE) to V_(LIQ) when V_(LIQ) is lower than V_(PRE). The final voltage values may deviate from V_(LIQ) as the working electrode charges. The rate of change of the voltage level on the integrating capacitor may be governed by the value of the resistance of the bilayer, which may include the nanopore, which may in turn include a molecule (e.g., a tag of a tagged nucleotide) in the nanopore. The voltage level can be measured at a predetermined time after switch 401 opens.

Switch 401 may operate at the rate of data acquisition. Switch 401 may be closed for a relatively short time period between two acquisitions of data, typically right after a measurement by the ADC. The switch allows multiple data points to be collected during each sub-period (bright or dark) of each AC cycle of V_(LIQ). If switch 401 remains open, the voltage level on the integrating capacitor, and thus the output value of the ADC, would fully decay and stay there. Instead, when switch 401 is closed, the integrating capacitor is precharged again (to V_(PRE)) and becomes ready for another measurement. Such multiple measurements can allow higher resolution with a fixed ADC (e.g., 8-bit to 14-bit due to the greater number of measurements, which may be averaged). The multiple measurements can also provide kinetic information about the molecule threaded into the nanopore. The timing information may allow the determination of how long a threading event lasts. This can also be used in helping to determine whether multiple nucleotides that are added to the nucleic acid strand are being sequenced.

FIG. 5 shows example data points captured from a nanopore cell during bright periods and dark periods of AC cycles. In FIG. 5, the change in the data points is exaggerated for illustration purposes. The voltage (V_(PRE)) applied to the working electrode or the integrating capacitor is at a constant level, such as, for example, 900 mV. A voltage signal 510 (V_(LIQ)) applied to the counter electrode of the nanopore cells is an AC signal shown as a rectangular wave, where the duty cycle may be any suitable value, such as less than or equal to 50%, for example, about 40%.

During a bright period 520, voltage signal 510 (V_(LIQ)) applied to the counter electrode is lower than the voltage V_(PRE) applied to the working electrode, such that a tag may be forced into the barrel of the nanopore by the electric field caused by the different voltage levels applied at the working electrode and the counter electrode (e.g., due to the charge on the tag and/or flow of the ions). When switch 401 is opened, the voltage at a node before the ADC (e.g., at an integrating capacitor) will decrease. After a voltage data point is captured (e.g., after a specified time period), switch 401 may be closed and the voltage at the measurement node will increase back to V_(PRE) again. The process can repeat to measure multiple voltage data points. In this way, multiple data points may be captured during the bright period.

As shown in FIG. 5, a first data point 522 (also referred to as first point delta (FPD)) in the bright period after a change in the sign of the V_(LIQ) signal may be lower than subsequent data points 524. This may be because there is no tag in the nanopore (open channel), and thus it has a low resistance and a high discharge rate. In some instances, first data point 522 may exceed the V_(LIQ) level as shown in FIG. 5. This may be caused by the capacitance of the bilayer coupling the signal to the on-chip capacitor. Data points 524 may be captured after a threading event has occurred, i.e., a tag is forced into the barrel of the nanopore, where the resistance of the nanopore and thus the rate of discharging of the integrating capacitor depends on the particular type of tag that is forced into the barrel of the nanopore. Data points 524 may decrease slightly for each measurement due to charge built up at C_(Double Layer) 424, as mentioned below.

During a dark period 530, voltage signal 510 (V_(LIQ)) applied to the counter electrode is higher than the voltage (V_(PRE)) applied to the working electrode, such that any tag would be pushed out of the barrel of the nanopore. When switch 401 is opened, the voltage at the measurement node increases because the voltage level of voltage signal 510 (V_(LIQ)) is higher than V_(PRE). After a voltage data point is captured (e.g., after a specified time period), switch 401 may be closed and the voltage at the measurement node will decrease back to V_(PRE) again. The process can repeat to measure multiple voltage data points. Thus, multiple data points may be captured during the dark period, including a first point delta 532 and subsequent data points 534. As described above, during the dark period, any nucleotide tag is pushed out of the nanopore, and thus minimal information about any nucleotide tag is obtained, other than for use in normalization.

FIG. 5 also shows that during bright period 540, even though voltage signal 510 (V_(LIQ)) applied to the counter electrode is lower than the voltage (V_(PRE)) applied to the working electrode, no threading event occurs (open-channel). Thus, the resistance of the nanopore is low, and the rate of discharging of the integrating capacitor is high. As a result, the captured data points, including a first data point 542 and subsequent data points 544, show low voltage levels.

The voltage measured during a bright or dark period might be expected to be about the same for each measurement of a constant resistance of the nanopore (e.g., made during a bright mode of a given AC cycle while one tag is in the nanopore), but this may not be the case when charge builds up at double layer capacitor 424 (C_(Double Layer)). This charge build-up can cause the time constant of the nanopore cell to become longer. As a result, the voltage level may be shifted, thereby causing the measured value to decrease for each data point in a cycle. Thus, within a cycle, the data points may change somewhat from one data point to another, as shown in FIG. 5.

4. Determining Bases

For each usable nanopore cell of the nanopore sensor chip, a production mode can be run to sequence nucleic acids. The ADC output data captured during the sequencing can be normalized to provide greater accuracy. Normalization can account for offset effects, such as cycle shape and baseline shift. After normalization, embodiments can determine clusters of voltages for the threaded channels, where each cluster corresponds to a different tag species, and thus a different nucleotide. The clusters can be used to determine probabilities of a given voltage corresponding to a given nucleotide. As another example, the clusters can be used to determine cutoff voltages for discriminating between different nucleotides (bases).
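One simple way to turn clusters into cutoff voltages is to place each cutoff midway between neighboring cluster centers. The Python sketch below illustrates that idea only; the midpoint rule, the function names, and the cluster centers (including which base maps to which normalized level) are assumptions made for illustration and are not taken from the source.

```python
from bisect import bisect_right

def cutoff_voltages(centers):
    """Midpoints between neighboring cluster centers (assumed cutoff rule)."""
    ordered = sorted(centers)
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

def call_base(voltage, labeled_centers):
    """Return the label whose cutoff interval contains the normalized voltage.

    labeled_centers maps a label to its cluster center.
    """
    items = sorted(labeled_centers.items(), key=lambda kv: kv[1])
    cutoffs = cutoff_voltages([center for _, center in items])
    return items[bisect_right(cutoffs, voltage)][0]

# Hypothetical normalized cluster centers; real values depend on the tags used.
centers = {"A": 0.55, "C": 0.65, "G": 0.75, "T": 0.85}
```

For example, call_base(0.66, centers) falls between the first and second cutoffs and is assigned to the second cluster. A probabilistic assignment, as also mentioned above, would replace the hard cutoffs with per-cluster likelihoods.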

Further details regarding the sequencing operation can be found in, for example, U.S. Patent Publication No. 2016/0178577 entitled “Nanopore-Based Sequencing With Varying Voltage Stimulus,” U.S. Patent Publication No. 2016/0178554 entitled “Nanopore-Based Sequencing With Varying Voltage Stimulus,” U.S. patent application Ser. No. 15/085,700 entitled “Non-Destructive Bilayer Monitoring Using Measurement Of Bilayer Response To Electrical Stimulus,” and U.S. patent application Ser. No. 15/085,713 entitled “Electrical Enhancement Of Bilayer Formation,” the disclosures of which are incorporated by reference in their entirety for all purposes.

5. Periodicity of Voltage Values

FIG. 6 shows sample bright and dark period data for a test sequencing run according to some embodiments. Bright period data are shown on top portion 601 of the figure and the dark period data are shown on the bottom portion 603 of the figure. The periodicity of the voltage data is caused by an alternating signal provided by an alternating (AC) voltage source, e.g., AC voltage source 420, as described above in reference to FIG. 4. Each data point shown in FIG. 6 is obtained by an ADC measurement of the voltage on a node of the nanopore cell circuit, e.g., at n_(cap) in FIG. 4, after a certain period of time relative to the opening of pass device 406. For each measurement, the voltage at n_(cap) starts at V_(PRE) (V_(PRE) is shown as dashed line 612) and then decays, approaching +/−V_(LIQ) depending on the period (bright or dark) within the AC cycle. After a certain time delay, the ADC measures a voltage value. FIG. 6 shows the collection of these measured voltage values, i.e., each data point is a single point sample of the RC decay curve from V_(PRE) to V_(LIQ). For the example shown in FIG. 6, the data acquisition rate is about 1,976 Hz. Within each period, the variation in voltage from point-to-point is caused, in part, by charge buildup in the cell, leading to an overall shift in the underlying voltage decay curve for the charging/discharging of the integrating capacitor (e.g., capacitor 408 or capacitor 426, depending on the circuit used).

FIG. 6 shows data from an open channel state of the bright mode, e.g., bright mode data 620 that precedes a threading event 610 that appears shortly after the start of the bright period of the 7^(th) AC cycle. Subsequent open channel values and threading events in other AC cycles are also shown as time progresses. In some embodiments, as shown here, the measured ADC values in the bright periods are actually fairly repeatable from cycle to cycle for both threaded and open channel states. This opens up the possibility that the systematic offsets and noise in one bright period's data may be compensated using an adjacent (or even a subsequent non-adjacent) bright period's data, without the need to use dark channel data. The following section details one or more embodiments that make use of the periodicity of the voltage data.

III. Period-to-Period Analysis

Embodiments take advantage of the periodicity in the data, e.g., as shown in FIG. 6 above. Embodiments can use few assumptions or parameters and thus be more robust. In this manner, embodiments can be used with new systems that have a new nanopore, lipid bilayer, etc., where minimal characterization of the new cell has been done. Thus, embodiments can be broadly applicable without having much prior knowledge of the sequencing cell being used.

A. Determining a Difference Signal

To determine a difference signal, one cycle of data can be subtracted from another cycle of data. In some embodiments, corresponding data points originate from a neighboring cycle (e.g., nearest neighbor, second neighbor, etc.).

FIG. 7 shows sets of data points of multiple cycles 1-4 with each cycle having a respective bright and dark period, denoted by B and D labels. The determination of difference data according to some embodiments proceeds as follows. Raw ADC data (not shown) is used to create two shifted data sets, each of which may be stored in memory. Signal 710 is the raw data shifted by one-half period to the left (referred to herein as left_adc) and data 720 is the raw data shifted one-half period to the right (referred to herein as right_adc). While this embodiment shows an example of a net one-period shift, other shifts are possible without departing from the scope of the present disclosure, e.g., two-period shifts, three-period shifts, etc. Further, the raw data can be used, along with shifted data that is shifted one full period, as opposed to shifting twice at a half-period. The processed difference data 730 (referred to herein as p2p_diff) can then be created by subtracting the two shifted adc-signals, in this case:

p2p_diff=left_adc−right_adc.

In some embodiments, the first cycle of processed difference data 730 (p2p_diff) is obtained by subtracting raw cycle 2 from raw cycle 1. The second cycle of processed difference data 730 (p2p_diff) is obtained by subtracting raw cycle 3 from raw cycle 2. The third cycle of processed difference data 730 (p2p_diff) is obtained by subtracting raw cycle 4 from raw cycle 3, and so on. In the processed difference data 730, the single threading event 770 from the raw data is duplicated, first appearing as a positive peak (event peak 750) and subsequently appearing again as a negative peak (event peak 760).

One of ordinary skill will appreciate that event peaks 750 and 760 are generally of opposite sign and thus, the positive and negative qualifiers are used herein as merely one example. The positive and negative peaks for this single threading event are separated in time by an amount that equals the net time shift between the two shifted data sets (one full period in this example). However, the net time shift may be longer for threading events that persist for multiple cycles.

While FIG. 7 shows a point-wise period-to-period differencing method to compute the processed difference data 730, any differencing scheme may be used without departing from the scope of the present disclosure, e.g., shifts may be in either direction (right-to-left or left-to-right) by a single period or a multiple of a period. FIG. 7 shows a nearest-neighbor difference by (net) shifting left-to-right by a single period. However, differences can be taken by shifting by a multiple of periods, which can provide coarser-scale information about the underlying signal.

A difference for the first cycle and/or last cycle may not be determined because there may be no first cycle data or last cycle data for one of the shifted cycles. Accordingly, these regions are referred to herein as “invalid regions.” An example of a first invalid region 740 is shown in FIG. 7.
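The half-period shifting and subtraction described above can be sketched in plain Python as follows. This is a minimal illustration, assuming an even number of samples per period and marking invalid regions with None; the helper names and the synthetic ADC values (open-channel bright samples at 10, threaded bright samples at 14, dark samples at 20) are made up for the example.

```python
def shift(data, n):
    """Shift data by n samples (n > 0: right, n < 0: left), padding with None."""
    size = len(data)
    if n >= 0:
        return [None] * min(n, size) + data[:max(size - n, 0)]
    return data[-n:] + [None] * min(-n, size)

def p2p_diff(adc, period):
    """Compute left_adc - right_adc using half-period shifts in each direction.

    Samples without a partner in both shifted sets are returned as None,
    which corresponds to the invalid regions at the ends of the record.
    """
    half = period // 2          # assumes an even number of samples per period
    left_adc = shift(adc, -half)
    right_adc = shift(adc, half)
    return [None if a is None or b is None else a - b
            for a, b in zip(left_adc, right_adc)]

# Four synthetic cycles of 4 samples (2 bright, 2 dark); the tag threads
# only during the bright period of the third cycle.
raw = [10, 10, 20, 20,
       10, 10, 20, 20,
       14, 14, 20, 20,
       10, 10, 20, 20]
diff = p2p_diff(raw, period=4)
```

In this toy record the single threading event shows up twice in diff, once as a run of +4 and, one period later, as a run of -4, while the repeatable open-channel and dark samples cancel to zero.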

FIG. 8 illustrates an embodiment of the differencing technique, using the same shift method described in FIG. 7 above, but this time having raw data of a slightly different nature. As shown here, in some embodiments, a threading event may last more than one AC cycle. For example, in the data of FIG. 8, the threading event lasts three cycles, shown by threading events 810, 820, and 830 occurring during cycles 2, 3, and 4, respectively. However, due to the repeatability in the raw data for each of these threading events, only the first threading event 810 and the last threading event 830 may appear in the processed difference data as positive peak 840 and negative peak 850, respectively. One of ordinary skill will appreciate that peaks 840 and 850 are generally of opposite sign and thus, the positive and negative qualifiers are used herein as merely one example. In addition, the time separation between the positive and negative peaks is no longer merely the net time shift applied to the raw data (again, 1 period in this example), but is the sum of the net time shift and the duration of time between first threading event 810 and the last threading event 830.
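The peak separation for a multi-cycle threading event can be checked with a small per-cycle example. The Python sketch below reduces each bright period to a single representative level and takes later-minus-earlier differences between neighboring cycles; the levels (10 for open channel, 14 for threaded) and the helper name are hypothetical.

```python
def cycle_differences(levels):
    """Nearest-neighbor differences between per-cycle levels (later - earlier)."""
    return [later - earlier for earlier, later in zip(levels, levels[1:])]

# One representative bright-period level per cycle; the tag is threaded
# during cycles 2, 3, and 4 (list indices 1, 2, and 3).
levels = [10, 14, 14, 14, 10, 10]
diffs = cycle_differences(levels)

first_pos = next(i for i, d in enumerate(diffs) if d > 0)
first_neg = next(i for i, d in enumerate(diffs) if d < 0)
separation_cycles = first_neg - first_pos

net_shift_cycles = 1            # one-period net shift, as in the example above
first_to_last_event = 3 - 1     # cycles between the first and last threaded cycle
```

Here the middle cycles cancel, and the positive and negative peaks are 3 cycles apart, which equals the one-period net shift plus the 2 cycles between the first and last threaded cycles.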

FIG. 9 shows a schematic diagram to illustrate a method of shifting raw data points for determining difference data according to some embodiments. The first row shows the original data set, with each positive channel marked with a number. The negative channel is not used in this example and is therefore grayed out with diagonal hatching. The second row shows a one cycle right-shift of the data. The third row shows a one cycle left-shift of the data. The fourth row shows an example difference signal, where the difference is the original data minus the right shifted signal, such that the earlier cycle is subtracted from the later cycle.

The fifth row shows the difference data of the original data minus the left shifted signal. In this example, the later cycle is subtracted from the earlier cycle. Thus, a threading event would start with a positive pulse.

The last row shows a difference between the right-shifted data and the left-shifted data. In some embodiments, a nearest neighbor difference (e.g., 1−2) and a second nearest neighbor difference (e.g., 1−3) can be used to determine whether the difference is due to one having a threading event or two having a threading event. In some embodiments, this can help to confirm that two chips provide the same results.

B. Difference Data

FIG. 10 shows an example of difference data for a threading event according to some embodiments. Specifically, FIG. 10 shows difference data 1010 measured over time as would be produced, e.g., by digital processor 430 during operation of the sequencer. Thus, this embodiment may not have access to the underlying raw AC signals (e.g., the signals shown in FIG. 6). Nevertheless, the information necessary to determine a base call may be extracted from difference data 1010 alone.

As already described above, the positive pulse 1020 represents a threading event that begins at time t₁. In some embodiments, the threading event may occur slightly after the bright mode begins at time t₀. For example, as described above in reference to FIG. 5, t₀ indicates the moment when V_(LIQ) switches from being smaller than V_(PRE) to being larger than V_(PRE) (or vice versa depending on the architecture of the cell), and thus the precise time at which the direction of the electric field across the pore switched. The time difference (t₁−t₀) provides useful information and, in some embodiments, can be interpreted as the time-to-thread (TTT), i.e., the time it took the tag to be inserted into the nanopore following a dark mode. The TTT can be helpful in design of the system, e.g., in creating nanopore molecules. For example, TTT is a useful design parameter when choosing pores and/or mutating pores for new applications, e.g., to create a new pore with an enhanced (faster) TTT.

In some embodiments, the amplitude L_(EVENT) of the threading pulse (also referred to herein as the “voltage level” or “level”) corresponds to a particular tag involved in the threading event, with 4 different (and distinguishable) amplitudes occurring for the open channel (unthreaded channel) and the 4 bases (A, G, C, and T). The width dt_(intra-cycle) of the pulse corresponds to the duration over which the threading event occurred, the so-called “dwell time” (assuming the duration is within a single bright cycle). However, once threaded in a first cycle, a tag may typically thread for many cycles after (e.g., 100), and thus the dwell time may extend across many cycles. Accordingly, the width of the pulse may be more appropriately interpreted as the intra-cycle dwell time dt_(intra-cycle). As shown in FIG. 10, the intra-cycle dwell time dt_(intra-cycle) can be measured by the width of the first peak in the difference data for a given threading event.

The inter-cycle dwell time is shown as dt_(inter-cycle) and can be measured from the difference data as the time difference between the starting positive pulse and the ending negative pulse, e.g., the time difference between the first (earliest in time) edge of the first pulse and the last (latest in time) edge of the last pulse. Other edges can be used as endpoints without departing from the scope of the present disclosure.

FIG. 8 above illustrates a multi-cycle threading event, where the tag threaded on three consecutive cycles. As discussed in reference to FIG. 8, in a multi-cycle threading event, the difference between the voltage points in the middle cycles may typically be zero. However, the subtraction may not be perfect, e.g., the TTT for each threading event may differ slightly, leading to a few points between the positive and negative pulses having a level near L_(EVENT). An example of this phenomenon is shown in the sample difference data points 1201 of FIG. 12 below. Such a variation can be distinguished from new threading events based on duration, because these intra-threading event variations may show up in the data over only one or two ADC cycles. In this manner, the occasional stray point can be assumed to be part of the same threading event signaled by the first large positive pulse.

The peak value of any peaks seen in the data, e.g., noise peak 1030, can be compared to the peak values of the larger threading event pulses to determine whether the peaks are noise or a real threading event. As already mentioned above, slow moving changes to the signal, e.g., gain drift from one cycle to another, would likely be consistent from one cycle to another (basically showing up as a DC offset) and can be removed and/or minimized during the differencing process. Thus, the period-to-period differencing method described herein can provide a threading event detector that is generally insensitive to changes that happen at long time scales and/or variations in the data that are repeatable from cycle to cycle. In addition, the baseline of the difference data need not be precisely zero, or even known in advance, because the baseline can be identified from a statistical analysis of the difference data itself, as described below.
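By way of illustration only, the insensitivity to slow drift can be sketched numerically. The following example assumes a purely linear, hypothetical drift of 0.05 ADC levels per cycle on an open channel level of 150 with 15 points per bright period; these values and all variable names are assumptions chosen for the sketch and are not taken from the figures.

```python
import numpy as np

points_per_cycle = 15      # bright-period points per cycle (illustrative)
n_cycles = 200
drift_per_cycle = 0.05     # hypothetical slow drift, in ADC levels per cycle

# Open-channel bright-period data, one row per cycle, with a slow linear drift.
cycle_index = np.arange(n_cycles)[:, None]
raw = 150.0 + drift_per_cycle * cycle_index + np.zeros((n_cycles, points_per_cycle))

# Period-to-period difference: each cycle minus the preceding cycle.
diff = raw[1:] - raw[:-1]

raw_spread = raw.max() - raw.min()     # grows with the length of the recording
diff_spread = diff.max() - diff.min()  # stays essentially zero
print(raw_spread, diff_spread)
```

In this sketch the raw data wanders by nearly 10 ADC levels over the recording, while the difference data sits at a small constant offset equal to the drift per cycle.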

In some embodiments, the rate of occurrence of threading events is slow (one or two per second) compared to the sampling rate, e.g., 2 kHz. Accordingly, the zero value can be determined as the most common value (mode) or average value of the difference signal. The noise can also be measured (e.g., 1.8 ADC levels) based on, e.g., the variance of the data without threading events (open channel data). The more cells that contribute to the difference signal, the smaller the noise can become, e.g., 1.0 ADC level. Based on the measured noise level, a threshold (+/−T_(noise) shown in FIG. 10) for identifying a threading event can be chosen, e.g., the threshold can be chosen based on the standard deviation of the noise, e.g., only data greater than 6 standard deviations outside the noise may be registered as threading events. Further, the width dt_(intra-cycle) can be required to be at least a certain number of points, e.g., 3, 4, 5, or 6. Thus, the threading voltage can be required to be seen at least a certain number of times in a cycle, which can reduce noise due to spurious voltage measurements.
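One possible way to carry out the baseline, noise, and threshold estimation described above is sketched below. The sketch takes the baseline as the mode of the difference data, estimates the open channel noise with a single clipping pass (an assumption made here to exclude event points; the disclosure does not prescribe how open channel data is isolated), and keeps only runs of at least a minimum number of points. The function names and the synthetic test signal are hypothetical.

```python
import numpy as np

def estimate_baseline_and_threshold(diff, n_sigma=6.0):
    """Return (baseline, noise_sigma, threshold) for integer difference data."""
    diff = np.asarray(diff)
    values, counts = np.unique(diff, return_counts=True)
    baseline = values[np.argmax(counts)]            # mode of the difference data
    # Assumption: one clipping pass removes event points before measuring noise.
    rough_sigma = diff.std()
    open_channel = diff[np.abs(diff - baseline) <= 3.0 * rough_sigma]
    noise_sigma = open_channel.std()
    return baseline, noise_sigma, n_sigma * noise_sigma

def find_runs(mask, min_width):
    """Return (start, stop) index pairs of True runs at least min_width long."""
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, stops = edges[0::2], edges[1::2]        # stop index is exclusive
    return [(int(s), int(e)) for s, e in zip(starts, stops) if e - s >= min_width]

# Synthetic difference data: small repeating noise, one 5-point pulse, one stray point.
diff = np.tile([0, 1, 0, -1, 0, 0], 50)
diff[100:105] = 40   # candidate threading pulse
diff[200] = 40       # single spurious point

baseline, noise_sigma, threshold = estimate_baseline_and_threshold(diff)
events = find_runs(np.abs(diff - baseline) > threshold, min_width=3)
print(baseline, threshold, events)
```

The single stray point exceeds the threshold but is rejected by the minimum width requirement, so only the 5-point pulse is reported.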

C. Calculating Difference Data

FIG. 11 is a flow chart illustrating an example method of using a sequencing cell, according to certain embodiments. More specifically, FIG. 11 illustrates a method of period-to-period analysis of AC signals from nanopore sequencing according to some embodiments.

In step 1110, an alternating signal (also referred to herein as an “AC signal”) is applied across a nanopore of the sequencing cell. Such an AC signal may be a square wave provided by an AC signal generator, similar to AC voltage source 420 (also referred to herein as an AC “signal generator”) described above in reference to FIG. 4. In some embodiments, the AC signal may be multiple cycles long with each cycle of the alternating signal comprising a first portion (also referred to herein as the “bright mode” or “bright period”) and a second portion (also referred to herein as the “dark mode” or “dark period”). The voltage levels of the second portion are on the opposite side of a reference voltage from the voltage levels of the first portion (V_(LIQ) is either above or below V_(PRE) in the embodiment shown in FIG. 5). As described above in reference to FIGS. 1-2, in some embodiments, the nanopore is configured to receive a tag that is connected to a nucleotide, thereby creating a threading event.

In step 1120, a first set of voltage data (also referred to herein as “unshifted voltage data” or “raw voltage data”) is acquired, e.g., by ADC 410, as described above in reference to FIG. 4. In some embodiments, the first set of voltage data is acquired during the first portion (e.g., the bright period) of the multiple cycles of the alternating signal. Examples of the first set of voltage data include the data points shown in bright period 520 of FIG. 5 and also all points characterized as within a “B” period, as shown in FIGS. 7-8. As shown in FIGS. 7-8, the first set of voltage data can include voltage data points acquired over multiple cycles of the AC signal. As described above, each data point of the voltage data corresponds to (i.e., depends on) a value of a resistance of the nanopore at a different time, where the resistance of the nanopore changes when the tag is received within the nanopore.

In step 1130, a time-shifted set of voltage data is determined from the acquired raw voltage data, e.g., by digital processor 430 shown above in FIG. 4. Examples of shifted data are shown in FIGS. 7-9, as discussed above. In some embodiments, each cycle of data points of the raw set of voltage data and the shifted set of voltage data includes a specified number of data points, e.g., the raw unshifted data may include 15 data points within a bright period and the shifted data may include a corresponding 15 data points within a bright period. Because the shifted data is time-shifted relative to the unshifted data, the data points of the shifted data and the data points of the unshifted data are from different cycles of the AC signal, as discussed above, e.g., in reference to FIGS. 7-9. For example, turning briefly to FIG. 9, the unshifted data points may originate from cycle 1 of the unshifted data and the shifted data points may originate from unshifted cycle 2, as shown in the example labeled Org-LShift.

In step 1140, difference data is computed, e.g., by digital processor 430 shown above in FIG. 4, by computing differences between data points of the unshifted set of voltage data and corresponding data points of the shifted set of voltage data. In some examples, the corresponding data points have the same position in a respective cycle but may be present in different cycles. For example, for unshifted data points that originate from cycle 1 of the unshifted data and shifted data points that originate from cycle 2 of the unshifted data, difference data may be computed in the following manner: the first difference data point may be computed by subtracting the first point of cycle 1 from the first point of cycle 2, the second difference data point may be computed by subtracting the second point of cycle 1 from the second point of cycle 2, and so on. One of ordinary skill having the benefit of this disclosure will appreciate that there are many different ways to perform the difference, and the single point method described above is meant as merely one example among many. For example, multiple data points from each cycle may be averaged or filtered before subtraction, or differences may be computed based on nearest neighbor subtractions, next nearest neighbor subtractions, or the like without departing from the scope of the present disclosure.
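As a minimal sketch of the single point method described above, the bright-period data can be arranged with one row per cycle and each cycle subtracted from the cycle that follows it. The function name, the `shift_cycles` parameter, and the sample values (open channel level 150, threaded level 110) are assumptions introduced for illustration; other differencing schemes are equally possible.

```python
import numpy as np

def period_to_period_diff(bright, points_per_cycle, shift_cycles=1):
    """Subtract each bright-period point from the same-position point
    shift_cycles cycles later (cycle 2 minus cycle 1, and so on)."""
    # Widen beyond 1-byte ADC values so that negative differences do not wrap.
    bright = np.asarray(bright, dtype=np.int16)
    n_cycles = bright.size // points_per_cycle
    cycles = bright[: n_cycles * points_per_cycle].reshape(n_cycles, points_per_cycle)
    return cycles[shift_cycles:] - cycles[:-shift_cycles]

# Four bright periods of 5 points; the tag is threaded from the third point
# onward in the second and third cycles.
open_cycle = [150, 150, 150, 150, 150]
threaded_cycle = [150, 150, 110, 110, 110]
bright = open_cycle + threaded_cycle + threaded_cycle + open_cycle

diff = period_to_period_diff(bright, points_per_cycle=5)
print(diff)
```

In this sketch the first and last rows of `diff` carry pulses of opposite sign, and the middle row, which compares two threaded cycles, is zero. The sign of the starting pulse depends on the subtraction order and on whether threading raises or lowers the measured level.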

In step 1150, a threading event is detected (i.e., a tag has been received within the nanopore) based on one or more data points in the difference data. For example, a threading event may be identified by the presence of a pulse in the difference data as described above in reference to FIG. 10. In some embodiments, threading events are determined by identifying an open channel level as the mode of the difference data. Threading events may be identified by determining a starting pulse of the difference data that exceeds a threshold value, determining an ending pulse in the difference data that follows the starting pulse and is of an opposite sign from a sign of the starting pulse, and determining a time difference between the starting pulse and the ending pulse. In some embodiments, the threshold value may be determined from the mode of the difference data, as described above in reference to FIG. 10.
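The pairing of a starting pulse with an opposite-sign ending pulse can be sketched as follows. This is one illustrative approach, not a definitive implementation: consecutive same-sign points beyond the threshold are grouped into pulses, and each starting pulse is paired with the next pulse of opposite sign. The function names, the dictionary keys, and the choice to ignore same-sign stray pulses between the two are assumptions.

```python
def find_pulses(diff, threshold, baseline=0):
    """Group consecutive same-sign points beyond the threshold into pulses."""
    pulses, current = [], None
    for i, value in enumerate(diff):
        dev = value - baseline
        sign = 0 if abs(dev) <= threshold else (1 if dev > 0 else -1)
        if current is not None and sign == current["sign"]:
            current["stop"] = i + 1          # extend the running pulse
            continue
        if current is not None:
            pulses.append(current)           # the running pulse has ended
            current = None
        if sign != 0:
            current = {"sign": sign, "start": i, "stop": i + 1, "level": dev}
    if current is not None:
        pulses.append(current)
    return pulses

def pair_pulses(pulses):
    """Pair each starting pulse with the next pulse of opposite sign."""
    events, opening = [], None
    for pulse in pulses:
        if opening is None:
            opening = pulse
        elif pulse["sign"] == -opening["sign"]:
            events.append({
                "start": opening["start"],
                "end": pulse["stop"],
                # first edge of the first pulse to last edge of the last pulse
                "duration": pulse["stop"] - opening["start"],
                "level": opening["level"],
            })
            opening = None
        # A same-sign pulse is treated as a stray point of the same event.
    return events

diff = [0, 0, 40, 40, 40, 0, 0, 0, 0, 0, 0, 0, -40, -40, -40, 0]
events = pair_pulses(find_pulses(diff, threshold=10))
print(events)
```

Here the duration is counted in difference data points; converting it to time would require the sampling rate and the cycle structure, which are left out of the sketch.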

In some embodiments, for a given cell, the unshifted data (i.e., the raw data) can effectively be copied and the copy stored in memory. The difference between the corresponding voltage points of neighboring cycles is then computed using the stored unshifted data and the stored copy of the unshifted data or a time-shifted copy of the unshifted data. Equivalently, a single copy of the unshifted data can be used with the points of a current pair of cycles being read by a processor with a difference taken between the two.

In the data processor cache, there can be stored locations, with a given cycle stored in one location and used for two consecutive difference calculations. For one calculation, the cycle data points would correspond to the ending cycle for determining the difference. For a next calculation, the same cycle data points would correspond to the initial cycle for determining the difference. In this manner, the number of operations can be reduced relative to having two copies of the entire array and reading two sets of cycle data points for every difference calculation.

For example, a first set of data points can be stored in a first memory location and a second set of data points can be stored in a second memory location. A difference can be taken of the set of data points at the first memory location minus the set of data points at the second memory location. Then, for the next calculation, the first set of data points can be removed from memory and a third set of data points can be stored in the first memory location. The next difference taken will be of the second set of data points at the second memory location minus the third set of data points at the first memory location, and so on.
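The alternating use of two memory locations described above can be sketched with a generator that holds only two cycle buffers at a time. The function name and the sample values are hypothetical, and the subtraction order follows this example (earlier cycle minus later cycle), which is the reverse of the order used in the description of step 1140.

```python
import numpy as np

def rolling_cycle_differences(cycles):
    """Yield (earlier cycle - later cycle) while keeping only two buffers."""
    iterator = iter(cycles)
    try:
        first = next(iterator)
    except StopIteration:
        return
    # Two memory locations; int16 avoids wraparound of 1-byte ADC values.
    slots = [np.array(first, dtype=np.int16), None]
    older = 0                                  # slot holding the earlier cycle
    for cycle in iterator:
        newer = 1 - older
        slots[newer] = np.asarray(cycle, dtype=np.int16)  # overwrite stale slot
        yield slots[older] - slots[newer]
        older = newer                          # later cycle becomes the earlier one

cycles = [
    [150, 150, 150],
    [150, 110, 110],
    [150, 110, 110],
    [150, 150, 150],
]
diffs = [d.tolist() for d in rolling_cycle_differences(cycles)]
print(diffs)
```

Each cycle is written to memory once and read for two consecutive differences, so no second copy of the full data array is needed.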

While the above differencing method is described in the context of a digital signal processing technique, the present disclosure is not so limited. For example, analog techniques may be employed instead of, or in combination with, digital techniques without departing from the scope of the present disclosure. For example, the time shifting and differencing computation may be performed by one or more analog circuit elements, e.g., phase shifters, operational amplifiers, analog filters, or the like.

D. Data Compression and Performance

Computation of several properties of the pulses can allow compression of data, since the physical information needed for base calling may be extracted from a small set of properties of the pulses, with this set of properties being stored as a few parameters in memory. Chips can be large, e.g., hundreds of thousands or even millions of cells, each sampled at a rate of 2 kHz. Thus, in the case where every voltage point is stored, terabytes of data can be produced and associated storage costs can be high. In some embodiments, data compression through the computation of threading event parameters based on difference data can require only a handful of parameters to be stored per threading event, thereby greatly reducing storage requirements and cost.

Examples of the parameters stored for a threading event can include the one or more level(s) L_(EVENT), the time to thread TTT, and the various time differences, e.g., the time difference between the starting pulse and the ending pulse. As described above in reference to FIG. 10, the intra-cycle dwell time and inter-cycle dwell time can be computed and stored as compressed data.

In some embodiments, the difference data processed by the method disclosed herein includes a few other beneficial features. First, all non-threaded (i.e., open channel) data has a low amplitude, with open channel values for the difference data clustered around zero (or around some offset). On the other hand, threading events appear as abrupt steps (also referred to herein as pulses and/or peaks) whose amplitudes rise above the background noise level, e.g., event peak 750 shown in FIG. 7. This leads to the possibility of data compression. In some embodiments, and as discussed in further detail above in reference to FIGS. 10-11, to compress the data, only the minimum data necessary to characterize the peaks can be stored, with all other data being discarded. Second, because long term systematic drifts in the data (such as gain drift and/or offset shift) occur with a similar magnitude for two cycles that are relatively close in time, subtracting data from different cycles, as done here, corrects the data by removing these types of systematic effects. More generally, the processed data can be relatively immune to any systematic shifts in the data that 1) occur on a timescale that is long compared to the AC period; and/or 2) occur in a repeatable manner from cycle to cycle.

Another benefit of the method disclosed herein relates to improved processing speed. Since the proposed method is based on differences in ADC values (1-byte data), it is fast if the data is properly memory aligned. In addition, it can be easily vectorized per cell and parallelized for multiple cells. A throughput on the order of 100 MB/s can be expected for a single processor doing analysis on ubf files. In principle, one can skip ubf files by having data directly accessible (e.g., in a ring buffer) for processing. Thus, some embodiments of the method and system may be used for real-time processing, e.g., using an FPGA. Employment of the method disclosed herein can also obviate the need for intermediate formats like hdf5 for processing. In addition, since the method is naturally adaptive and based on a local neighborhood, it may not require extensive background processing.

IV. Results

FIGS. 12-14 show raw sequencing data and processed difference data according to some embodiments.

FIG. 12A shows a relatively short timescale (1 s total duration) plot of raw ADC test data showing individual AC cycles along with enough resolution to show individual bright and dark periods within each AC cycle. Several of the bright periods show threading events, e.g., threading events 1203. Each of the threading events has approximately the same ADC level, approximately 110, indicating that the same tag is being threaded into the nanopore in each event. There are several reasons why this may occur, e.g., it may take several cycles to catalyze the base associated with the tag, or multiple tagged bases of the same type may repeat in the nucleic acid being sequenced.

FIG. 12B shows difference data (p2p_diff) that has been computed from the raw (unshifted) ADC data shown in FIG. 12A, according to some embodiments. As described above in reference to FIGS. 6-9, in the difference data, each threading event is represented by a pair of event pulses having opposite signs. In addition, for multi-cycle threading events, the differencing process may not be perfect, e.g., if the TTT varies slightly from cycle to cycle, as discussed above in reference to FIG. 10. In this case, a few sample difference data points 1201 between the positive and negative pulses have a level that is comparable to the threading amplitude. However, these stray data points can be differentiated from threading events based on the timescales involved, i.e., they are much shorter in duration than real threading events.

FIGS. 13A and 13B show the results of the period-to-period differencing method, according to some embodiments. FIGS. 13A and 13B show raw ADC data and processed difference data over a duration of about 150 s, and thus many more threading events can be seen as compared to FIGS. 12A and 12B. Furthermore, the raw ADC data shown in FIG. 13A also evidences a time-varying gain drift in both the bright mode and dark mode data. This drift leads to the unprocessed data being error prone. However, as shown in FIG. 13B, because the drift is relatively stable from cycle to cycle (individual cycles are too fast to be seen here), the drift is effectively removed in the difference data. Furthermore, the difference data in FIG. 13B that is below a threshold value is deemed to be noise and is removed. FIG. 13B thus demonstrates that this data is not necessary to detect the threading events and can be removed to reduce overall data storage requirements.

FIGS. 14A and 14B show another set of sample data similar to that shown in FIGS. 13A and 13B. In the raw ADC data shown in FIG. 14A, a phenomenon known as baseline shift is observed. In some embodiments, this phenomenon may be caused by a cell's charge balance being abruptly brought out of equilibrium each time a threading event occurs. As a result, during a threading event, both the bright mode and the dark mode data trend upwards, and then begin to trend back downwards over time as the charges on capacitive elements in the cell redistribute to reach an equilibrium state. FIG. 14B shows that the differencing method disclosed herein is able to effectively correct for this offset shift. As in FIGS. 13A-13B, the difference data below a certain threshold is removed to reduce overall data storage requirements.

In some embodiments, the difference data can be used to conduct a preliminary analysis of the data to identify threading events. Later processing can involve classifying each event as a particular base call, followed by alignment. Accordingly, embodiments can obtain a signal from one or more sequencing cells, detect events, classify events to form base calls, put bases in a sequence, and align to a reference genome.

Some experimental parameters of the system can be the amount of salt, type of salt, amount of voltage at the pore, type of nanopore, duty cycle of bright and dark modes, frequency of the AC signal, and data acquisition rate. Embodiments can be agnostic to these different experimental parameters. Other benefits are speed of operation, ability to program into simple hardware (e.g., an FPGA as opposed to a GPU, or on a basic CPU), and data reduction (less memory).

V. Computer System

Any of the computer systems mentioned herein may utilize any suitable number of subsystems. Examples of such subsystems are shown in FIG. 15 in computer system 1510. In some embodiments, a computer system includes a single computer apparatus, where the subsystems can be the components of the computer apparatus. In other embodiments, a computer system can include multiple computer apparatuses, each being a subsystem, with internal components. A computer system can include desktop and laptop computers, tablets, mobile phones and other mobile devices.

The subsystems shown in FIG. 15 are interconnected via a system bus 1575. Additional subsystems such as a printer 1574, keyboard 1578, storage device(s) 1579, monitor 1576, which is coupled to display adapter 1582, and others are shown. Peripherals and input/output (I/O) devices, which couple to I/O controller 1571, can be connected to the computer system by any number of means known in the art such as input/output (I/O) port 1577 (e.g., USB, FireWire®). For example, I/O port 1577 or external interface 1581 (e.g., Ethernet, Wi-Fi, etc.) can be used to connect computer system 1510 to a wide area network such as the Internet, a mouse input device, or a scanner. The interconnection via system bus 1575 allows the central processor 1573 to communicate with each subsystem and to control the execution of a plurality of instructions from system memory 1572 or the storage device(s) 1579 (e.g., a fixed disk, such as a hard drive, or optical disk), as well as the exchange of information between subsystems. The system memory 1572 and/or the storage device(s) 1579 may embody a computer readable medium. Another subsystem is a data collection device 1585, such as a camera, microphone, accelerometer, and the like. Any of the data mentioned herein can be output from one component to another component and can be output to the user.

A computer system can include a plurality of the same components or subsystems, e.g., connected together by external interface 1581 or by an internal interface. In some embodiments, computer systems, subsystems, or apparatuses can communicate over a network. In such instances, one computer can be considered a client and another computer a server, where each can be part of a same computer system. A client and a server can each include multiple systems, subsystems, or components.

Aspects of embodiments can be implemented in the form of control logic using hardware (e.g., an application specific integrated circuit or field programmable gate array) and/or using computer software with a generally programmable processor in a modular or integrated manner. As used herein, a processor includes a single-core processor, multi-core processor on a same integrated chip, or multiple processing units on a single circuit board or networked. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement embodiments of the present invention using hardware and a combination of hardware and software.

Any of the software components or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, C, C++, C#, Objective-C, Swift, or scripting language such as Perl or Python using, for example, conventional or object-oriented techniques. The software code may be stored as a series of instructions or commands on a computer readable medium for storage and/or transmission. A suitable non-transitory computer readable medium can include random access memory (RAM), a read only memory (ROM), a magnetic medium such as a hard-drive or a floppy disk, or an optical medium such as a compact disk (CD) or DVD (digital versatile disk), flash memory, and the like. The computer readable medium may be any combination of such storage or transmission devices.

Such programs may also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet. As such, a computer readable medium may be created using a data signal encoded with such programs. Computer readable media encoded with the program code may be packaged with a compatible device or provided separately from other devices (e.g., via Internet download). Any such computer readable medium may reside on or within a single computer product (e.g., a hard drive, a CD, or an entire computer system), and may be present on or within different computer products within a system or network. A computer system may include a monitor, printer, or other suitable display for providing any of the results mentioned herein to a user.

Any of the methods described herein may be totally or partially performed with a computer system including one or more processors, which can be configured to perform the steps. Thus, embodiments can be directed to computer systems configured to perform the steps of any of the methods described herein, potentially with different components performing a respective step or a respective group of steps. Although presented as numbered steps, steps of methods herein can be performed at a same time or in a different order. Additionally, portions of these steps may be used with portions of other steps from other methods. Also, all or portions of a step may be optional. Additionally, any of the steps of any of the methods can be performed with modules, units, circuits, or other means for performing these steps.

The specific details of particular embodiments may be combined in any suitable manner without departing from the spirit and scope of embodiments of the invention. However, other embodiments of the invention may be directed to specific embodiments relating to each individual aspect, or specific combinations of these individual aspects.

The above description of example embodiments of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form described, and many modifications and variations are possible in light of the teaching above.

A recitation of “a”, “an” or “the” is intended to mean “one or more” unless specifically indicated to the contrary. The use of “or” is intended to mean an “inclusive or,” and not an “exclusive or” unless specifically indicated to the contrary. Reference to a “first” component does not necessarily require that a second component be provided. Moreover, reference to a “first” or a “second” component does not limit the referenced component to a particular location unless expressly stated.

All patents, patent applications, publications, and descriptions mentioned herein are incorporated by reference in their entirety for all purposes. None is admitted to be prior art.

What is claimed is:
 1. A system comprising: a sequencing cell configured to support a nanopore disposed in a membrane, the nanopore configured to receive a molecule, thereby creating a threading event; a signal generator configured to apply an alternating signal across the nanopore of the sequencing cell, the alternating signal comprising cycles, each of the cycles of the alternating signal comprising a first portion and a second portion, wherein voltage levels of the second portion are different than voltage levels of the first portion; a measurement circuit that acquires a first set of voltage data during the first portion of a plurality of the cycles of the alternating signal, the voltage data comprising data points, wherein each of the data points of the first set of voltage data corresponds to a value of a resistance of the nanopore at a different time, where the resistance of the nanopore changes when the molecule is received within the nanopore; and a processor configured to: determine a shifted set of voltage data from the first set of voltage data, wherein each cycle of data points of the first set of voltage data and the shifted set of voltage data includes a specified number of data points; compute difference data by computing differences between the data points of the first set of voltage data and corresponding data points of the shifted set of voltage data; and identify the threading event based on one or more data points in the difference data.
 2. The system of claim 1, further comprising a switch that selectably applies a voltage across the nanopore, wherein the switch is configured to open and close a plurality of times during the first portion of the alternating signal.
 3. The system of claim 1, wherein the shifted set of voltage data comprises data points selected from stored data representing the first set of voltage data.
 4. The system of claim 3, wherein the data points of the shifted set of voltage data are selected from a time-shifted copy of the first set of voltage data, and wherein computing the difference data further comprises computing a difference between corresponding data points of the first set of voltage data and the time-shifted copy of the first set of voltage data.
 5. The system of claim 1, wherein the processor is further configured to determine the shifted set of voltage data by: selecting a first subset of the first set of voltage data; and selecting a second subset of the first set of voltage data, wherein the cycles of the second subset of the first set of voltage data are different cycles from the cycles of the first subset of the first set of voltage data.
 6. The system of claim 5, wherein the corresponding data points are selected from the first subset of the first set of voltage data and from the second subset of the first set of voltage data.
 7. The system of claim 1, wherein the processor is further configured to determine the shifted set of voltage data by: generating a copy of the first set of voltage data; generating the shifted set of voltage data by applying a time shift to the copy of the first set of voltage data; and storing both the first set of voltage data and the shifted set of voltage data in memory.
 8. The system of claim 1, wherein the processor is further configured to: identify a mode of the difference data; determine a starting pulse of the difference data that exceeds a threshold value, wherein the threshold value is determined from the mode; determine an ending pulse in the difference data that follows the starting pulse and is of an opposite sign; determine a time difference between the starting pulse and the ending pulse; and store the time difference and an amplitude of the starting pulse and/or ending pulse in memory.
 9. The system of claim 1, further comprising: a sequencing chip that includes a plurality of sequencing cells, wherein the processor is further configured to: analyze a plurality of sets of difference data, wherein each one of the plurality of sets of difference data comes from a respective one of the plurality of sequencing cells.
 10. The system of claim 9, wherein the processor is further configured to cluster values of the plurality of sets of difference data to determine cutoff values for determining a base call based on levels of pulses in the difference data.
 11. A method of using a sequencing cell, the method comprising: applying an alternating signal across a nanopore of the sequencing cell, the nanopore configured to receive a molecule, thereby creating a threading event, the alternating signal comprising cycles, each cycle of the alternating signal comprising a first portion and a second portion, wherein voltage levels of the second portion are different than voltage levels of the first portion; acquiring a first set of voltage data during the first portion of a plurality of the cycles of the alternating signal, wherein each data point of the first set of voltage data corresponds to a value of a resistance of the nanopore at a different time, where the resistance of the nanopore changes when the molecule is received within the nanopore; determining a shifted set of voltage data from the first set of voltage data, wherein each cycle of data points of the first set of voltage data and the shifted set of voltage data includes a specified number of data points; computing difference data by computing differences between the data points of the first set of voltage data and corresponding data points of the shifted set of voltage data; and identifying the threading event based on one or more data points in the difference data.
 12. The method of claim 11, wherein the corresponding data points have a same position in different cycles.
 13. The method of claim 11, wherein the corresponding data points of the shifted set of voltage data are selected from the first set of voltage data, wherein the data points of the shifted data are selected from different cycles than the cycles for the specified number of data points.
 14. The method of claim 11, wherein the shifted set of voltage data comprises data points selected from a time-shifted copy of the first set of voltage data, and wherein computing the difference data comprises computing a difference between corresponding data points of the first set of voltage data and the time-shifted copy of the first set of voltage data.
 15. The method of claim 11, wherein determining the shifted set of voltage data further comprises: selecting a first subset of the first set of voltage data; and selecting a second subset of the first set of voltage data, wherein the cycles of the second subset of the first set of voltage data are different cycles from the cycles of the first subset of the first set of voltage data.
 16. The method of claim 15, wherein the corresponding data points are selected from the first subset of the first set of voltage data and from the second subset of the first set of voltage data.
 17. The method of claim 11, wherein determining the shifted set of voltage data further comprises: generating a copy of the first set of voltage data; generating the shifted set of voltage data by applying a time shift to the copy of the first set of voltage data; and storing both the first set of voltage data and the shifted set of voltage data in memory.
 18. The method of claim 11, further comprising: identifying a mode of the difference data; and determining a starting pulse of the difference data that exceeds a threshold value, wherein the threshold value is determined from the mode.
 19. The method of claim 18, further comprising: determining an ending pulse in the difference data that follows the starting pulse and is of an opposite sign from a sign of the starting pulse; determining a time difference between the starting pulse and the ending pulse; and storing the time difference and an amplitude of the starting pulse and/or ending pulse in memory.
 20. The method of claim 11, further comprising: analyzing a plurality of sets of difference data, wherein each one of the plurality of sets of difference data comes from a respective one of a plurality of sequencing cells of a sequencing chip; and clustering a plurality of values of the plurality of sets of difference data to determine cutoff values for determining a base call based on levels of pulses in the difference data.